Statistically Efficient, Polynomial-Time Algorithms for Combinatorial Semi-Bandits

Authors

Abstract

We consider combinatorial semi-bandits over a set of arms X \subset \{0,1\}^d where rewards are uncorrelated across items. For this problem, the algorithm ESCB yields the smallest known regret bound R(T) = O(d (\ln m)^2 (\ln T) / \Delta_{\min}) after T rounds, where m = \max_{x \in X} 1^\top x. However, it has computational complexity O(|X|), which is typically exponential in d, and cannot be used in large dimensions. We propose the first algorithm that is both computationally and statistically efficient for this problem, with asymptotic computational complexity O(\delta_T^{-1} poly(d)), where \delta_T is a function that vanishes arbitrarily slowly. Our approach involves carefully designing AESCB, an approximate version of ESCB with the same regret guarantees. We show that, whenever budgeted linear maximization over X can be solved up to a given approximation ratio, AESCB is implementable in polynomial time O(\delta_T^{-1} poly(d)) by repeatedly maximizing a linear function over X subject to a linear budget constraint, and we show how to solve these maximization problems efficiently.
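To make the budgeted linear maximization subproblem concrete, here is a minimal brute-force sketch in Python. It is not the paper's method: it enumerates every arm, which is exactly the O(|X|) cost the abstract says is impractical in large dimensions. The arm set (all vectors with exactly m ones), the function names `m_sets` and `budgeted_linear_max`, and the vectors `a`, `b` and budget `s` are all hypothetical choices made for illustration.

```python
from itertools import combinations


def m_sets(d, m):
    """Toy arm set X (an assumption): all vectors in {0,1}^d with exactly m ones."""
    arms = []
    for support in combinations(range(d), m):
        x = [0] * d
        for i in support:
            x[i] = 1
        arms.append(tuple(x))
    return arms


def budgeted_linear_max(arms, a, b, s):
    """Maximize a^T x over x in arms subject to b^T x <= s, by enumeration.

    Returns (best_arm, best_value), or (None, None) if no arm fits the budget.
    """
    best_arm, best_val = None, None
    for x in arms:
        cost = sum(bi * xi for bi, xi in zip(b, x))
        if cost > s:
            continue  # arm violates the linear budget constraint
        val = sum(ai * xi for ai, xi in zip(a, x))
        if best_val is None or val > best_val:
            best_arm, best_val = x, val
    return best_arm, best_val


X = m_sets(4, 2)                # 6 arms in dimension d = 4
m = max(sum(x) for x in X)      # m = max over x in X of 1^T x
a = [5, 4, 3, 1]                # hypothetical objective weights
b = [4, 3, 1, 1]                # hypothetical budget weights
arm, value = budgeted_linear_max(X, a, b, s=4)
```

Without the budget, the best arm would pick items 0 and 1 (value 9), but their combined cost of 7 exceeds s = 4, so the constrained maximizer is a different arm. A polynomial-time implementation would replace the enumeration with a solver that exploits the structure of X.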

Similar resources

Efficient Ordered Combinatorial Semi-Bandits for Whole-Page Recommendation

The Multi-Armed Bandit (MAB) framework has been successfully applied in many web applications. However, many complex real-world applications that involve multiple content recommendations cannot fit into the traditional MAB setting. To address this issue, we consider an ordered combinatorial semi-bandit problem where the learner recommends S actions from a base set of K actions, and displays the res...

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

= \tilde{O}(K \sqrt{d n \min\{\ln(L), d\}}). (11) We now outline the proof of Theorem 3, which is based on (Russo & Van Roy, 2013; Dani et al., 2008). Let H_t denote the “history” (i.e. all the available information) by the start of episode t. Note that from the Bayesian perspective, conditioning on H_t, θ* and θ_t are i.i.d. drawn from N(\bar{θ}_t, Σ_t) (see (Russo & Van Roy, 2013)). This is because that con...

Efficient Learning in Large-Scale Combinatorial Semi-Bandits

• the agent knows a generalization matrix Φ ∈ \mathbb{R}^{L×d} s.t. \bar{w} = E_P[w_t] is “close” to span[Φ]
• such models are available in many cases
Performance Metrics: At each time t, choosing A_t ∈ \mathcal{A} can be challenging, since the combinatorial optimization problem \max_{A ∈ \mathcal{A}} \sum_{e ∈ A} w(e) can be NP-hard. We assume the agent uses a combinatorial optimization algorithm ORACLE to choose A_t, where ORACLE can be an approx...

Thompson Sampling for Combinatorial Semi-Bandits

We study the application of the Thompson Sampling (TS) methodology to the stochastic combinatorial multi-armed bandit (CMAB) framework. We analyze the standard TS algorithm for the general CMAB, and obtain the first distribution-dependent regret bound of O(m \log T / Δ_{\min}) for TS under general CMAB, where m is the number of arms, T is the time horizon, and Δ_{\min} is the minimum gap between the expect...

Bypassing Combinatorial Protections: Polynomial-Time Algorithms for Single-Peaked Electorates

For many election systems, bribery (and related) attacks have been shown NP-hard using constructions on combinatorially rich structures such as partitions and covers. This paper shows that for voters who follow the most central political-science model of electorates (single-peaked preferences), those hardness protections vanish. By using single-peaked preferences to simplify combinatorial coverin...

Journal

Journal title: Proceedings of the ACM on Measurement and Analysis of Computing Systems

Year: 2021

ISSN: 2476-1249

DOI: https://doi.org/10.1145/3447387